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This expository note describes how to apply the method of maximum likelihood to estimate the 
parameters of the "g-exponential" distributions introduced by Tsallis and collaborators. It also 
describes the relationship of these distributions to the classical Pareto distributions. 
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In a series of papers beginning with [l|, Constantino 
Tsallis and collaborators introduced what have come to 
be called q-exponential probability distributions. These 
can be defined through their "complementary" ( "upper" , 
"upper cumulative") distribution functions, also called 
"survival" functions: 



PgA X >X)=[1- 



(l-q)x 



1/(1-9) 



(1) 



Tsallis et al. proposed these distributions to handle 
statistical-mechanical systems with long-range interac- 
tions, necessitating (it is claimed) a non-extensive gen- 
eralization of the ordinary Gibbs-Shannon entropy. Fol- 
lowing Jaynes's procedure of maximizing an entropy sub- 
ject to constraints on expectation values Q , they got the 
g-exponential distributions, in which k enforces the con- 
straints, and q measures the departure from extensivity, 
Boltzmann-Gibbs statistics being recovered as q — » 1. 

Tsallis's ideas about non-extensive entropy and its 
possible applications, in and out of statistical me- 
chanics, have attracted intense (not to say "ex- 
tensive") interest in physics; the bibliography at 
http://tsallis.cat.cbpf.br/biblioThtM has over 
2000 entries. They are also quite controversial (see, e.g., 
Refs. @, i, i, @, 0, S|, the replies by Tsallis and others, 
and in some cases the replies to the replies). Whether 
or not the critics are correct, however, q-exponentials are 
still valid probability distributions, and can usefully de- 
scribe some empirical phenomena. To this end, in a re- 
cent paper Douglas R. White et al. pose the problem 
of estimating the parameters q and k from data by the 
method of maximum likelihood . This note solves that 
problem. 

I first reparameterize Eq. [1] to simplify estimation and 
emphasize links to Pareto distributions. I then rehearse 
the math of finding the maximum likelihood estima- 
tor (MLE) for the q-exponential distribution, discussing 
its accuracy and precision, and adjustments for data in 
which samples below a fixed threshold are all dropped 
("censoring"). I compare maximum-likelihood estimates 
to those found by the current practice of curve-fitting; the 
latter are inferior. Finally, I discuss testing the assump- 
tion that the data are g-exponentially distributed. Code 
implementing the MLE for g-cxponcntials is available at 



http : //bactra. org/research/tsallis-MLE/, written 
in R, a free, open-source programming language for sta- 
tistical computing (http://www.r-project.org/). This 
code also calculates probabilities and quantiles, generates 
random numbers, etc. 

a. Reparameterization as Generalized Pareto Distri- 
butions While it is possible to find the MLE for q- 
cxponcntials in the form given in Eq. [TJ the algebra is 
needlessly messy. It is simpler to reparameterize, and 
change back to the original parameter system at the end, 
if desired. (Under a 1-1 change of parameters, an MLE 
for the old parameters must, under the transformation, 
be an MLE for the new parameters, and vice versa.) 
Thus, define the new parameters 9 = and u = 9k, 

from which the original parameters can be recovered: 
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In the new parameter system, the survival function be- 
comes 



PeA x >x) = (1 + x/a) 
Hence the probability density is 



Pe,a( x ) = -(! + x l a ) 
a 



-o 



(3) 



(4) 



The code mentioned above uses both parameterizations. 

Y has a Pareto distribution with scaling exponent a 
and cut-off yo if p(y) = when y < yo, and otherwise 
p(y) oc (y/yo) a . Hence when X has a ^-exponential 
distribution, 1 + x/a has a Pareto distribution with cut- 
off 1 and scaling exponent 9. Following the classification 
given in Arnold's monograph on Pareto distributions [ic| , 
this is an instance of a "type II generalized Pareto" , often 
used in operations research on failure times and other 
reliability problems, the standard form of which is P(X > 
x) = [1 + (x — a ■ The g-exponentials come from 

taking /i = and a = 9] the ordinary Pareto distribution 
is recovered by taking a = xq and fi — a. According 
to Ref. [H pp. 13-14, 208-2101, the type II generalized 
Pareto was introduced in Refs. [ill, E EH > and the latter 
two also derived the MLE. 1 The calculations below are a 
special case of their results, except for the treatment of 
censoring, which may be new. 
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1 Arnold lid p. 48] observes that a mixture of exponentials can 
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b. Derivation of the MLE for q- Exponentials Un- 
der the g-exponential model with parameters 9, a, the 
log-probability density of a sequence of independent, 
identically-distributed samples X\ — x\, Xi = xi, 
. . . X n = x n , for short Xf = x\, is 

\ogPe,a(xi) = -nloga + n\og9 (5) 

n 

-(d + 1)J2 log l+Xi/a 

i=l 

= 1(6,*) , (6) 



the data: 



the log-likelihood of the parameter combination 9, a. 

To find the MLEs, take the first derivatives of the log- 
likelihood with respect to the parameters and set them 
equal to zero. First, the shape parameter 9: 
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-^logl + xja 

i=l 

n 

T^logl + Xi/a 
,i=l 

Similarly for the scale parameter a: 

n 9 + 1 Xi 

1 ; — > 

a a 2 1 + Xi a 

2 — 1 ' 

+ 1 Xj 

n ^ 1 + Xi lb 

i=l 11 



(7) 
(8) 
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Eqs. [51 and ITD1 give the MLEs for 9 and a, respectively, 
if the other parameter is known. The former gives the 
value of 9 explicitly 2 , while the latter does so implicitly, 
through the solution of an equation. Implicitly-defined 
MLEs like this occur in several generalizations of the ex- 
ponential distribution, such as the ones known to physi- 
cists as "stretched exponentials" and to statisticians as 
"Weibull distributions" (after the physicist who intro- 
duced them) [13, ch. 20]. The lack of a closed form is 
only a small annoyance, since such equations can gener- 
ally be rapidly solved numerically, to a precision much 
smaller than the uncertainty inherent in the data. 

If neither 9 nor a is known (i.e., neither q nor k), then 
the simultaneous solution of Eqs. [51 and [TU1 gives the joint 
maximum likelihood estimator. Substituting the former 
equation into the latter gives a single equation in a and 



produce a type II generalized Parcto. If the distribution of X — fi, 
given Z, is an exponential with mean cr/Z, and Z has a T(a, 1) 
distribution, then X has a type II generalized Pareto distribu- 
tion with parameters /i, a and a. He assigns priority for this 
result to fllt | . It would appear to be equivalent to C. Beck's 
"superstatistics" approach to Tsallis statistics (reviewed in [13 ). 
2 Cf. the well-known MLE for the scaling exponent in a Pareto 
distribution [TJ, El UJ , a = n/ [£i=i log as/aro] ■ 




log 1 + Xi I a 



1 



Xi/a 



(11) 



This does not seem to simplify, but, again, can be solved 
numerically. (Eq. [11] is transcendental, whereas Eq. [10] 
is rational, but no worse than the equation for the MLE 
of the Weibull distribution, which also contains a sum 
of logarithms, etc.) Substituting the solution into Eq. [5J 
gives 9, and then Eq. [2] give q, k. 

c. Accuracy and Precision of the MLE An estima- 
tor ip(Xi) of a parameter ip of a statistical distribution 
is consistent when ip converges in probability to ip, i.e., 
for any e > and any 5 > 0, for sufficiently large n, 



(||vi(Xf)-^ 



> e J < 5. In other words, a consistent 

estimator is "probably (1 — 8) approximately (e) correct", 
for arbitrarily small 8 and e. Under quite general condi- 
tions, met here, maximum likelihood estimators are con- 
sistent [ll]. 

Consistency alone is not enough to calculate standard 
errors or confidence regions. However, under conditions 
only mildly more restrictive than those needed for con- 
sistency, MLEs are asymptotically normal and unbiased. 
That is, tp(Xf) — -0 has, for large n, a multidimensional 
Gaussian distribution with mean zero and covariance ma- 
trix (l/n)/ _1 (-0), where I(tp) is the Fisher information 
matrix, 
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d 2 logpipjx] 



p^(x)dx 



(12) 



By the famous Cramer- Rao inequality [19(, any consis- 
tent unbiased estimator has a covariance at least equal 
to / _1 (^); the MLE is asymptotically efficient because it 
attains this bound. Since the true value of ifi is unknown, 
I(ip) cannot give us standard errors or confidence regions, 
but I(i/j) is a consistent estimator of I(ip), and can be 
used for those purposes. Another consistent estimator of 
the Fisher information is the observed information ma- 
trix, Jij(ip) = —n~ 1 d 2 £(ip)/dif)idipj, and J(i>) also gives 
asymptotically-correct error estimates. Ref. [13] treats 
these standard results in detail. 

For g-exponential distributions, it is easy to verify that 
the standard conditions for the asymptotic normality of 
the MLE hold. In the 9, a parameterization, simple but 
lengthy calculus yields 



1(9, a) 



(13) 



Either 1(9, a) or the observed information matrix could 
be used to find standard errors and Gaussian confidence 
regions. Propagation of errors can then carry these to 
estimates on q and k. 

For small samples, asymptotic approximations should 
be avoided in favor of parametric bootstrapping [2ll sec. 
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9.11]. Having obtained an estimate ip — ^(a;™), make 
up a "bootstrap" sample of random numbers Yi, Y 2 , ...Y n 
with the density p^, and calculate Tp(Yp). The distribu- 
tion of tpiY") — if) is approximately the same as that of 
i/)(Xi)—ip, so by taking many bootstrap samples one can 
estimate standard errors and confidence regions, without 
making Gaussian approximations. (For more on boot- 
strapping, see, e.g., [HI, ch. 8].) The code mentioned 
above finds bootstrapped biases, standard errors and con- 
fidence intervals. 

d. Censored Data In many applications, only mea- 
surements exceeding some known lower threshold xq are 
available, i.e., only values of X > xq become data. Pa- 
rameters estimation from such left-censored data must 
take account of the threshold. Specifically, rather than 
maximizing the unconditional likelihood, £(9, a), one 
should maximize the likelihood conditional on being in 
the right tail, £q(S, cr, xo). It is easily shown that the 
censored density is when x < xq, and otherwise 

P$,<7,x (x) = (1 + xq/cj) pe,a{x) (14) 

Pe,a{x) being given by Eq. 2J The censored likelihood 
thus equals £(0, a) plus a term involving only 0, a and 
x a : 



e c (0,<y,x o ) =£(0,a)+n0\ogl + x o /a 

The likelihood estimating equations become 

n -l 

1 + Xi/cr 
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1 + xo/ac 
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■ Xi/ac 



(15) 
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Eqs. [TC] and [T7] reduce to Eqs. [5] and [TU] when xq = 
(no censoring), and can be solved in the same way. The 
MLE remains consistent, and asymptotically normal and 
efficient. The Fisher information matrix, after an even 
longer calculation, ends up being 1(0, a + xq); explicitly, 



lc(0, cr, Xq) = 



(0 + l)(<7+X O ) 

____________ ^ 

"(0+l)(<H-xo) (a+x a )' A (B+2) 



1 



(18) 



Bootstrapping, however, is even more strongly recom- 
mended than with uncensored data. Simulated values 
should be drawn from the tail only. 

e. Comparison to Curve-Fitting Hitherto, attempts 
to estimate the parameters of g-exponential distributions 
have been based on curve-fitting. (Ref. [12] is an unusu- 
ally careful example.) Taking the log of both sides of Eq. 
® 



\ogP e A x >x) = -9log (1 + x/a) 



(19) 



Write S n (x) for the empirical distribution function, i.e., 
the fraction of points in x\,x 2 , . . .x n which are > x. 
Then we expect that, at least for large sample sizes n, 
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(20) 



Sample size 

black=curve-fitting, blue=MI_E, 500 replicates at each sample size 



FIG. 1: (Color online) Comparison of estimates of q using the 
MLE and curve-fitting. All data generated using q = 4/3, k = 
200/3 (9 = 3,cr = 200), with varying sample sizes, and 500 
independent replications are each sample size. Curve-fitting 
estimates are plotted in black, displaced slightly to the left, 
and MLEs in blue, displaced to the right. Solid bars show 
the 5th and 95th percentiles of sample estimates, circles the 
median estimate, and dashed lines the sample extrema. Note 
that the MLE is always less biased and more precise. 



for all Xi. A least-squares approach to estimating the 
parameters minimizes the squared difference between the 
two sides of Eq. [20l summed over all Xi, i.e., 

(0, or) = argminV (logS^) + 0\og (l + — ) ) 

( 21 ) 

It is possible to show that this estimator is consistent. 

The analogous procedure for Pareto distributions was 
the one originally used by Pareto in the 1890s [lCJ, and 
still widespread in physics. For Pareto distributions, 
however, statisticians have known since the 1950s that 
such estimation-by-regression is much more biased, and 
much less precise, than the maximum likelihood estima- 
tor [l[tll5j. (In particular, the standard errors are much 
larger than blind use of the ordinary regression formu- 
las suggest.) The same is true of the least-squares esti- 
mate of q-exponentials (Fig.[I}. Fitting curves to binned 
estimates of the probability density, rather than to the 
cumulative distribution, is even less accurate. Neither 
approach should be used. 

/. Validation All the claims of consistency, effi- 
ciency, etc., made above assume that the data really do 
come from a q-exponential distribution. In statistical ter- 
minology, the assumption is that the q-exponential model 
is correctly specified, as opposed to being mis-specified. 
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In applications to empirical data, it is crucial to check 
this assumption. Rigorous m is-sp ecification tests are too 
complicated to go into here [23l . l24j | , but some remarks 
are in order. 

The most common test of specification in the literature 
on Tsallis statistics is to look at the fraction, R 2 , of the 
variance in log S n accounted for by the fitted distribu- 
tional curve. Unfortunately, this popularity is not based 
on any reliability; it is easy to construct examples where 
1+x/a has, say, a log-normal distribution, but R 2 is al- 
ways close to 1 . Rather than looking at R 2 , one should ei- 
ther test g-exponentials against alternative distributions 
such as the Pareto, the log- normal, etc., or do general 
goodncss-of-fit tests, adjusting for the way parameters 
are estimated from the data [2l|, ch. 10]. The latter must 
be interpreted with caution: failing a goodness-of-fit test 
provides strong evidence against a model, but passing one 
may give only very weak evidence in its favor, depending 
on the severity of the test (2f| . 

Two heuristic checks for mis-specification deserve men- 
tion. One compares the parametric bootstrap, described 
above, with a non-parametric bootstrap, in which the val- 
ues Y\, . . . Y n come from resampling the data x\, . . .x n 
with replacement, not from the fitted distribution. If 
parametric and non-parametric bootstrap estimates of 



bias, standard error, etc., differ substantially, this is a 
sign that the model is mis-specified. Similarly, if the ex- 
pected Fisher information at the MLE, 1(9, a) is very 
different from the observed information, J (9, a), this 
again suggests the statistical model poorly describes the 
data-generating process. The comparison of informa- 
tion matrices can be turned into a formal test for mis- 
specification (23j . 

Conclusion Tsallis q-exponentials are legitimate pos- 
sible models of heavy-tailed data. Under other names, 
they have been so used in operations research and statis- 
tics for half a century, without any entropic origin story. 
To model data with g-exponentials, their parameters 
must be estimated accurately. The estimators currently 
used by physicists are inferior to the MLE, which is 
asymptotically efficient. If physicists want to describe 
data with g-exponentials, they should stop fitting curves 
and start maximizing likelihoods. Whether using Tsal- 
lis statistics is a good idea in the first place is another 
matter, beyond the scope of this note. 
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